Independent component analysis algorithms for non-invasive fetal electrocardiography

Methods based on independent component analysis (ICA) are among the most prevalent techniques used for non-invasive fetal electrocardiogram (NI-fECG) processing. Often, these methods are combined with other methods, such as adaptive algorithms. However, there are many variants of the ICA methods and it is not clear which one is the most suitable for this task. The goal of this study is to test and objectively evaluate 11 variants of ICA methods combined with an adaptive fast transversal filter (FTF) for the purpose of extracting the NI-fECG. The methods were tested on two datasets, the Labour dataset and the Pregnancy dataset, which contain real records obtained during clinical practice. The efficiency of the methods was evaluated in terms of the accuracy of QRS complex detection using the parameters of accuracy (ACC), sensitivity (SE), positive predictive value (PPV), and the harmonic mean of SE and PPV (F1). The best results were achieved with a combination of FastICA and FTF, which yielded mean values of ACC = 83.72%, SE = 92.13%, PPV = 90.16%, and F1 = 91.14%. Computation time was also taken into consideration. Although FastICA ranked only sixth fastest, with a mean computation time of 0.452 s, it had the best ratio of performance to speed. The combination of FastICA and the adaptive FTF filter turned out to be very promising. In addition, a device based on this approach would require signals acquired from the abdominal area only, with no need to acquire a reference signal from the mother's chest.


Introduction
Fetal electrocardiography (fECG) is a method based on sensing the electrical activity of the fetal heart in the form of electric potentials. The fECG recording provides clinically important information about the fetus' medical condition and can be used for the timely identification of congenital heart defects (such as fetal arrhythmia or fetal atrioventricular block) and, most importantly, of fetal hypoxia [1,2].
Reduced oxygen and blood supply to the fetal brain can cause hypoxic-ischemic encephalopathy (HIE), resulting in long-term neurological disorders like mental impairment and cerebral palsy [3]. Accurate monitoring of fetal oxygen and blood supply is crucial to prevent these healthcare concerns. Diagnostic tools are continuously being developed for fetal surveillance. The most used methods are fetal heart rate (fHR) monitoring, fetal scalp blood sampling, ultrasound, and magnetic resonance imaging [4]. Fetal heart rate monitoring is particularly important during labor. Currently, cardiotocography (CTG) is the most prevalent method for non-invasive electronic monitoring worldwide, as it allows concurrent monitoring of fHR and uterine contractions [5]. However, it has several disadvantages, including a high rate of false positive results, which may lead to unnecessary interventions such as emergency caesarean sections [6,7]. Therefore, current research efforts focus on finding alternative methods for non-invasive fHR monitoring that would be able to detect fetal hypoxia with higher reliability.
One of these methods is fECG, which is able to provide additional information about the fetal health state besides fHR, which is determined using the most prominent peaks of the ECG waveform, the R waves [8]. With fECG, it is possible to accurately identify adverse events during labor (hypoxic conditions) manifested by morphological changes to the fECG signal, particularly the ST segment [9]. It can also help in determining other hazardous states during pregnancy based on changes in other parts of the waveform, such as the QT interval [10-12].
Monitoring with fECG can be done by invasive sensing with a scalp electrode or by non-invasive sensing on the surface of the mother's abdomen. The invasive variant yields an fECG signal of very high quality; however, there is a risk of introducing an infection into the uterus, and this variant can be carried out only during delivery, after the rupture of membranes. The non-invasive variant is much safer than the invasive fECG and may be used both during pregnancy and during delivery [1,5]. Moreover, unlike CTG, it allows truly long-term continuous monitoring, as the method is completely passive and neither the mother nor the fetus is exposed to ultrasound energy. Also, electric signals coming from the uterus during contractions can be acquired in addition to the fECG signal. A further advantage of non-invasive fECG (NI-fECG) is the possibility of incorporating it into a device for continual remote home monitoring (fetal Holter), which is particularly useful in high-risk pregnancies or in suspected arrhythmias [13]. This would allow the doctor to remotely monitor the medical condition of the mother as well as of the fetus in real time, and the number of in-person visits to the doctor could be reduced [14]. Another advantage of NI-fECG is its ease of use and the fact that its quality does not depend much on the operator's experience.
The disadvantage of the non-invasive variant is that the ECG acquired from the abdomen (aECG) is a mixture of the useful signal (fECG), the maternal ECG (mECG), and noise (motion artifacts, myopotentials, isoelectric line fluctuations) [1]; see Eq (1), where n denotes the number of samples of each vector.
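Under the additive mixing just described, Eq (1) can be written in the following form. This is a sketch reconstructed from the sentence above; the symbol for the combined noise term is chosen here for illustration and is an assumption.

```latex
\mathrm{aECG}(n) = \mathrm{mECG}(n) + \mathrm{fECG}(n) + \eta(n) \qquad (1)
% \eta(n): combined noise (motion artifacts, myopotentials,
% isoelectric line fluctuations); the symbol itself is an assumption.
```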
In addition, the mECG amplitude is usually higher than the fECG amplitude, and the mECG signal overlaps with the fECG in both the time and frequency domains, so its elimination requires advanced extraction algorithms or combinations thereof [15,16].
The following section includes a review of the state-of-the-art methods for fECG extraction, with particular focus on the hybrid systems. The aim of this article is to identify and test the most promising methods for fECG extraction in terms of the performance, implementability and computational speed.

State-of-the-art fECG extraction methods
Various algorithms for fECG extraction were tested in the past, among which are techniques based on blind source separation (BSS) [17], wavelet transform (WT) [18], empirical mode decomposition (EMD) [19,20], template subtraction [21] or adaptive algorithms [15,16]. The most promising results for fECG extractions acquired in our prior studies [15,16] were achieved with adaptive algorithms, which used aECG and mECG signal inputs. The mECG signal was modified by the adaptive algorithm so that it corresponds to the maternal component of the aECG signal as much as possible. This modified mECG could be then subtracted from the aECG to acquire the fECG signal as a result.
In general, there are two approaches to acquiring aECG and mECG signal in a non-invasive manner. The first approach is based on sensing aECG signal from abdominal area of the mother, which is used as the primary input, while a reference mECG signal is acquired from the mother's chest area. However, the mECG signal acquired from the chest has different morphology than the maternal component of the aECG signal and is usually of poor quality and the extraction using the adaptive algorithm is therefore less effective. The measurement process is also less comfortable for the mother and reduces her mobility.
Therefore, we used the second approach to acquire the aECG and mECG input signals. It is based on sensing the aECG signals only and on a BSS algorithm, which is able to reliably estimate the mECG signal and an aECG signal with an enhanced fetal component (marked aECG*) from a mix of abdominal signals [15,16]. Both extracted signals can be used as input signals for an adaptive algorithm, which is then used to modify the mECG signal according to the maternal component in the aECG* signal. The modified signal is then subtracted from aECG*, thus creating the resulting fECG signal (an example of the aECG input signals, the aECG* and mECG components estimated by the ICA-based method, and the extracted fECG signals is shown in Fig 1).
Since the performance of the adaptive algorithm depends on the quality of input signals, selecting a suitable BSS algorithm is important and has a direct effect on the quality of the estimated aECG* and mECG. Most BSS methods use second- and higher-order statistics, while the source signals and the mixing process are unknown. There are various BSS-based approaches which were tested in the past for fECG extraction. Among those are the most commonly used independent component analysis (ICA) [17,22-25], principal component analysis (PCA) [22,25], or non-negative matrix factorization (NMF) [2,19].
• A comparison of the effectiveness of the kurtosis-based FastICA method, negentropy-based FastICA, the joint approximate diagonalization of eigenmatrices (JADE) algorithm and PCA for extracting fECG was presented in [22]. The results were evaluated objectively by using the signal-to-noise-ratio (SNR) parameter and subjectively by comparing the waveforms of extracted signals. According to the authors, the best results were achieved with JADE, but the FastICA method proved effective from the perspective of computation time.
• Periodic component analysis was designed and tested for fECG extraction in [26]. No statistical results were presented, and the extracted signals were evaluated only subjectively. However, the authors believe that the method was effective from the perspective of both performance and time. In addition, the assumption of periodic component analysis based on the aperiodic temporal structure criterion was more reasonable in connection with ECG than the assumption of the conventional ICA based on the independence criterion.
• In [23], the authors designed and tested non-parametric ICA based on the kernel density estimation method. The method was tested on synthetic records and the extraction quality was evaluated by means of SNR parameter. Non-parametric ICA managed to extract the fECG even in those records where FastICA and JADE methods failed. The disadvantage of this method is increased computational complexity.
• A comparison of ICA-based methods: second-order blind identification (SOBI), algorithm for extraction of multiple unknown sources (AMUSE) and eigenvalue decomposition was carried out in [24]. Real records were used to test the methods and the results were compared using the sensitivity (SE) and positive predictive value (PPV) parameters. The best results of fECG extraction were achieved with SOBI method (SE = 75.1% and PPV = 69.7% in off-line extraction and SE = 59% and PPV = 46% during on-line extraction).
• A combination of ICA and PCA method was tested on real records in [25]. The ICA method turned out to be effective for the enhancement of fECG and the PCA method managed to effectively suppress mECG. However, the authors recommended validation of this method on signals from women in earlier stages of pregnancy.
• The NMF method was designed for fECG extraction in [2] and tested on real records. Its performance was then compared to the ICA. The methods were tested directly on the original aECG input signals, on aECG signals in compressed domains, and on the recovered aECG after compression. The quality of extraction was evaluated using the SE, PPV, and F1 parameters. Based on the F1 parameter, the NMF method proved to be a better solution when used for original aECG signals (NMF yielded a mean of F1 = 94.8%, while ICA yielded F1 = 93.6%) and for recovered aECG (NMF yielded a mean of F1 = 96.75%, while ICA yielded F1 = 95%). On the other hand, ICA turned out to be more suitable for signals in the compressed domain (ICA yielded a mean of F1 = 92.5%, while NMF yielded F1 = 24%).
• NMF was tested in combination with the EMD method by the authors in [19]. The EMD method was applied on the aECG input signal, which decomposed it into multiple intrinsic mode functions. A non-negative matrix was created based on the extracted intrinsic mode functions and the number of estimated independent components, to which NMF was then applied and separated mECG and fECG. The method was tested on real and synthetic records and evaluated based on SNR and visual comparison of the extracted signals. With the help of the designed EMD-NMF method, higher SNR and more effective fECG extraction were achieved compared to the single channel ICA and combination of WT-ICA.
Based on the conducted review of the state-of-the-art BSS methods used in fetal ECG extraction, the ICA-based methods appeared to be the most prevalent and also the most promising for NI-fECG processing, at least as one of its steps. Often, these algorithms are combined with other methods, such as adaptive algorithms [15,16]. However, there are many variants of the ICA method (FastICA, JADE, SOBI, AMUSE, etc.). Therefore, before implementing newer techniques, it is necessary to examine these conventional methods in detail and determine their strengths and weaknesses, so that new methods can be compared with the conventional ones to establish whether they provide any improvement. Currently, there is no extensive objective comparison of various ICA-based methods in the field of NI-fECG, and it is therefore not clear which of the ICA algorithms is the most suitable for this task.
The goal of this study is therefore to test and objectively evaluate eleven ICA methods in terms of their performance and computational speed. The tests include the following ICA variants: the AMUSE algorithm, the equivariant robust ICA algorithm (ERICA), the FastICA algorithm, the flexible ICA algorithm (FlexICA), the logistic infomax ICA algorithm (Infomax), JADE, the kernel ICA algorithm (KICA), the robust accurate direct ICA algorithm (RADICAL), the robust ICA algorithm (RobustICA), simultaneous blind signal extraction using cumulants (SIMBEC), and the SOBI algorithm.

Material and methods
This chapter provides a basic description of the tested ICA-based methods and the adaptive FTF algorithm. The real datasets used for objective evaluation and validation of the applied algorithms are described later in this chapter. Fig 1 shows a block diagram of the extraction procedure, using recording r9 of the Pregnancy dataset to illustrate our experiments. The first example in this figure shows the input aECG signals preprocessed by a finite impulse response (FIR) filter. The second example shows the aECG* and mECG components estimated by the ICA method. In the aECG* signal, the fetal component is enhanced and thus more prominent. The third example represents the resulting fECG signal, obtained after modification of the mECG signal with the FTF algorithm and subtraction of the modified mECG signal from the aECG* signal.

Independent component analysis based methods
Independent component analysis is a method that aims to find a linear representation of non-Gaussian data so that the components are statistically independent, or as independent as possible. Such a representation captures the basic structure of data in many applications, including feature extraction and signal separation [27].
Before applying the ICA method to the aECG input signals, it is often useful to pre-process the signals by centering and whitening. After centering, the input signals have zero mean, whereas whitening makes them uncorrelated with unit variance [27].
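A minimal sketch of centering and whitening is given below, assuming NumPy and a signal array laid out as channels by samples. The function name `center_and_whiten` and the toy data are hypothetical and serve only as an illustration; the whitening matrix is built from the eigenvalue decomposition of the sample covariance, which is one common choice among several.

```python
import numpy as np

def center_and_whiten(x):
    """Center and whiten a (channels, samples) array.

    Returns the whitened signals z and the whitening matrix V,
    so that z = V @ (x - mean) has identity sample covariance.
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=1, keepdims=True)      # centering: zero mean per channel
    cov = xc @ xc.T / xc.shape[1]               # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # cov = E diag(d) E^T
    V = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return V @ xc, V

# Toy data standing in for four abdominal channels (not real aECG).
rng = np.random.default_rng(0)
sources = rng.uniform(-1.0, 1.0, size=(4, 5000))
mixing = rng.normal(size=(4, 4))
x = mixing @ sources + 3.0                      # mixed signals with an offset
z, V = center_and_whiten(x)
cov_z = z @ z.T / z.shape[1]                    # close to the identity matrix
```

After this step every channel of `z` has zero mean and the channels are mutually uncorrelated with unit variance, which is the starting point assumed by several of the ICA variants described in this section.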
For basic motivation, consider a signal sensed from the abdominal area of a pregnant woman (aECG) with multiple signal sources, such as fECG, mECG, electrical activity of the uterus and other muscles (manifesting as electrohysterogram (EHG) and electromyogram (EMG), respectively) and noise caused by the ambient environment. For further description, these sources are denoted as $\tilde{s}$ and their acquisition from the abdominal area creates the mixed signals $\tilde{x}$, from which each source needs to be separated. The standard linear model of the ICA method can be described by Eq (2), where $\tilde{x}$ represents a mix of signals consisting of the mutually independent sources $\tilde{s}$. The number of signals (measuring electrodes) $\tilde{x}$ must not be lower than the number of source signals $\tilde{s}$. The mixing matrix A is unknown; therefore, the goal of the ICA method is to estimate its inverse, the demixing matrix W. Eq (3) describes the estimation of the output independent components $\tilde{y}$ using the estimated demixing matrix W. The ICA-based methods differ mainly in the manner in which this matrix is estimated [27].
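For reference, the noise-free linear ICA model and the corresponding demixing step have the following textbook form [27]. The notation follows the paragraph above; the exact typesetting of Eq (2) and Eq (3) in the original layout is an assumption.

```latex
\tilde{x} = A\,\tilde{s} \qquad (2)

\tilde{y} = W\,\tilde{x}, \qquad W \approx A^{-1} \qquad (3)
% If there are more electrodes than sources, A is not square and
% W plays the role of a (pseudo-)inverse of A.
```

Note that the components in $\tilde{y}$ can be recovered only up to their order, sign, and scale, which is why the maternal and fetal components have to be identified after the decomposition.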
There are many different ICA-based methods. In this study, the methods described below were used for the experiment (official versions of each algorithm provided by the creators of the algorithms were used to increase the reproducibility of the experiments):
• AMUSE algorithm uses eigenvalue decomposition. It is a second-order blind source separation algorithm, which uses the structure in the data to find uncorrelated components. To do so, it performs a singular value decomposition of a time-shifted cross-covariance matrix. The shift is selected so that the autocorrelations of the sources at this shift are non-zero and as different from each other as possible. The AMUSE algorithm consists of several steps. In the first step, data are gathered and a covariance matrix is estimated. Then follows a singular value decomposition of the covariance matrix and an estimation of the number of sources and of the noise variance. Data are then transformed. The next step is eigenvalue/eigenvector decomposition. The source signals are estimated, followed by an estimate of the actual channel parameter matrix. More information and mathematical description can be found in [28].
• ERICA algorithm separates signals with non-zero kurtosis from mixed signals in the presence of Gaussian noise. This algorithm is a quasi-Newton iteration, which will converge to a saddle point with local isotropic convergence, regardless of the source distribution. Whitening is not necessary for the algorithm to converge. The algorithm iterates using a mixing matrix estimate, a learning rate, and a matrix of fourth-order cross-cumulants to identify the unknown mixing matrix. The demixing matrix is then computed as the pseudo-inverse of the matrix A. The iteration process continues until the selected convergence rate is achieved. More information and mathematical description can be found in [29].
• FastICA algorithm is currently the most commonly used type of the ICA method. It is built on a fixed-point iteration scheme for finding maxima of the non-Gaussianity of the data, which can be achieved using an approximation of the Newton iteration. Pre-processing by centering and whitening must be done before applying the FastICA algorithm, and a convergence criterion and a maximum number of iterations must be set. Convergence is reached when the old and new weight vectors point in the same direction, i.e. when the absolute value of their scalar product is practically equal to one. First, a random initial weight vector of unit norm is created. A new vector is then calculated using kurtosis or negentropy. This is followed by normalization and by checking whether the absolute value of the scalar product of the new and old weight vectors differs from one by less than the selected convergence criterion. If that is not the case, the second and third steps of the FastICA method are repeated until the convergence criterion is satisfied or the selected number of iterations is exceeded [27].
• FlexICA algorithm is a learning algorithm with flexible nonlinearity. This algorithm relies on hypothesized density functions since the probability density functions of the source signals are unknown. It uses the generalized Gaussian density, allowing approximation of all unimodal distributions. Simply speaking, FlexICA uses the kurtosis to select the correct value of the Gaussian exponent. It can be described as follows. As the first step, the dimension is reduced by whitening. Then follows a search for an orthogonal factor that minimizes the mutual information in the whitened vector. This factor is then used to estimate the sought matrix W. More information and mathematical description can be found in [30].
• Infomax algorithm is based on maximizing the entropy and represents a natural gradient form for computing independent components. It can be considered a neural network learning method. The Infomax algorithm uses annealing based on weight changes to automate the separation process. The algorithm basically maps input values to output values so that the mutual information between the input and output values is maximized. The learning rate function and a function related to the nature of the distribution (i.e. super-Gaussian or sub-Gaussian) are selected during the computation. The output is the demixing matrix, i.e. the inverse of the mixing matrix. More information and mathematical description can be found in [31].
• JADE algorithm is based on joint diagonalization of cumulant matrices. This algorithm is highly effective for separating a low number of sources. Output components are separated by exploiting fourth-order moments, where an orthogonal rotation of the input signal is sought for the purpose of estimating source signals with high kurtosis. First, the JADE algorithm estimates a whitening matrix. This is followed by estimating a maximal set of cumulant matrices. The orthogonal contrast is optimized by finding a rotation matrix such that the cumulant matrices are as diagonal as possible. Finally, the mixing matrix or directly the output components (source signals) are estimated. More information and mathematical description can be found in [22].
• KICA algorithm is based on maximizing the kurtosis or, to be more precise, on optimization of a kurtosis-based cost function. This cost function is identical to the function applied in the FastICA algorithm. The first step is centering and whitening of the data. This is followed by initialization of a separation matrix and setting of the required parameters. A rotation matrix is then created, followed by computation of the estimated output components and an update of the demixing matrix. The components are estimated and the demixing matrix is updated iteratively. More information and mathematical description can be found in [32].
• RADICAL algorithm solves the ICA problem in arbitrary dimension. The RADICAL algorithm extracts independent signal sources using a differential entropy estimate based on a spacing estimator. It is a consistent, asymptotically efficient, and computationally undemanding algorithm. The goal of the RADICAL algorithm is to minimize the contrast function using Jacobi rotations, which are incorporated into the rotation matrix. The resulting rotation matrix contains all Jacobi rotations for all rotated pairs. This process is done for all sources estimated using the demixing matrix. The data must be whitened before applying RADICAL. Its advantages are the direct optimization of a measure of statistical independence, the absence of a probability density estimate, and the use of one-dimensional entropy estimation, which converges quickly and is robust to outliers. More information and mathematical description can be found in [33].
• RobustICA algorithm is based on an exact line search of the optimal kurtosis. In this algorithm, the step size is optimized in each iteration in order to achieve kurtosis maximization and a reduction of computational complexity. It is a simple modification of the FastICA algorithm in which an exact line search of the kurtosis contrast is performed. Several steps are taken during each iteration of the optimal step-size optimization. The coefficients of the optimal step-size polynomial are computed in the first step, followed by extraction of the roots of this polynomial. In the next step, the root leading to the absolute maximum is selected. The demixing matrix is then updated and normalization is performed. More information and mathematical description can be found in [34].
• SIMBEC algorithm uses simultaneous extraction of the independent components that are present in the given data. The algorithm optimizes the maximum likelihood criterion using a gradient algorithm on a Stiefel manifold. It uses natural gradient ascent on the Stiefel manifold to concurrently extract sources using a contrast function based on higher-order cumulants, with a learning rate that provides quick convergence. The algorithm is described by the following procedure. First, a separation matrix is computed. This is followed by a computation of the cross-cumulant matrix. Finally, the sought source signals are estimated. More information and mathematical description can be found in [35].
• SOBI algorithm is based on second-order statistics and uses the time-correlation structure to estimate the original signals. The primary concept of SOBI is the assumption of a diagonal form of the delayed correlation matrices. This allows approximating the original signals. The input signals are whitened first, followed by estimation of the sample covariance from the input data and its diagonalization. Noise power is then estimated by averaging the smallest eigenvalues in the whitening matrix. The next step is to estimate the unitary matrix in the joint diagonalization. Finally, the resulting matrix A is computed and the source signals are estimated. More information and mathematical description can be found in [36].
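As an illustration of the kind of iteration these methods perform, the FastICA steps listed above can be sketched as follows. This is a minimal kurtosis-based, deflation-type sketch assuming NumPy; it is not the official implementation used in the study, and the function names `whiten` and `fastica_kurtosis` as well as the toy sources are hypothetical.

```python
import numpy as np

def whiten(x):
    """Center and whiten a (channels, samples) array."""
    xc = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(xc @ xc.T / xc.shape[1])
    return E @ np.diag(d ** -0.5) @ E.T @ xc

def fastica_kurtosis(z, n_components, max_iter=200, tol=1e-8, seed=0):
    """Deflation FastICA with the kurtosis (cubic) nonlinearity.

    z must already be whitened. Returns the demixing matrix W
    (rows are unit-norm weight vectors), so that y = W @ z.
    """
    rng = np.random.default_rng(seed)
    W = np.zeros((n_components, z.shape[0]))
    for k in range(n_components):
        w = rng.normal(size=z.shape[0])
        w /= np.linalg.norm(w)                       # random unit-norm start
        for _ in range(max_iter):
            y = w @ z
            w_new = (z * y ** 3).mean(axis=1) - 3.0 * w   # fixed-point update
            w_new -= W[:k].T @ (W[:k] @ w_new)       # stay orthogonal to found rows
            w_new /= np.linalg.norm(w_new)           # normalization
            converged = abs(abs(w_new @ w) - 1.0) < tol   # same direction up to sign
            w = w_new
            if converged:
                break
        W[k] = w
    return W

# Toy demonstration with two non-Gaussian sources (not real ECG data).
rng = np.random.default_rng(1)
n = 10000
s = np.vstack([rng.uniform(-1.0, 1.0, n), rng.laplace(size=n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])               # hypothetical mixing matrix
x = A @ s
z = whiten(x)
W = fastica_kurtosis(z, 2)
y = W @ z
# Absolute correlation between each estimate (rows) and each source (columns).
corr = np.abs(np.corrcoef(np.vstack([y, s]))[:2, 2:])
```

Each estimated component should correlate strongly with exactly one source, in arbitrary order and with arbitrary sign, which reflects the usual permutation and sign ambiguity of ICA.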

Fast transversal filter
The FTF is an adaptive filter that automatically modifies its coefficients so that the filter converges to the optimal condition. This condition is achieved by minimizing the error signal between the adaptive filter output and the required signal [37]. In this case, the mECG and aECG* signals estimated by the ICA method were used as inputs. The aECG* signal was considered as the desired signal $\tilde{d}(n)$ and the mECG signal was modified by the adaptive filter into the form of the maternal component in the aECG* signal. This FTF-modified mECG signal was designated as $\tilde{y}(n)$. By subtracting the signal $\tilde{y}(n)$ from $\tilde{d}(n)$, the fECG, considered as the error signal $\tilde{e}(n)$, was found:

$$\tilde{e}(n) = \tilde{d}(n) - \tilde{y}(n). \qquad (4)$$

In the case of the FTF algorithm (as with the recursive least squares algorithm), the error function is optimized by a deterministic approach. The main benefit of the FTF filter is that its performance is comparable with that of the recursive least squares method, while its computation times are shorter (especially for higher filter orders). This is achieved by implementing four filters (the forward prediction transversal filter, the backward prediction transversal filter, the gain computation transversal filter and the joint-process estimation transversal filter) operating together on a single task. However, this algorithm suffers from instability in a finite-precision environment. A full description of this algorithm is rather extensive; see [37] for more detailed information.
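The cancellation scheme of Eq (4) can be illustrated with a short sketch. Because the FTF involves four coupled transversal filters, the sketch below swaps in a conventional recursive least squares (RLS) filter, which the text above describes as having comparable performance; it is therefore not the FTF itself. NumPy is assumed, and the function name `rls_cancel`, the filter order, the forgetting factor, and the synthetic pulse trains are all hypothetical choices made for illustration.

```python
import numpy as np

def rls_cancel(d, x, order=8, lam=0.999, delta=100.0):
    """Adaptive cancellation with conventional RLS (a stand-in for FTF).

    d: desired signal (here playing the role of aECG*)
    x: reference signal (here playing the role of mECG)
    Returns (e, y) with e(n) = d(n) - y(n) as in Eq (4).
    """
    w = np.zeros(order)                 # adaptive filter coefficients
    P = np.eye(order) * delta           # inverse correlation matrix estimate
    buf = np.zeros(order)               # latest reference samples, newest first
    y = np.zeros(len(d))
    e = np.zeros(len(d))
    for n in range(len(d)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        Pu = P @ buf
        k = Pu / (lam + buf @ Pu)       # gain vector
        y[n] = w @ buf                  # modified reference (maternal estimate)
        e[n] = d[n] - y[n]              # error signal, i.e. the fECG estimate
        w = w + k * e[n]
        P = (P - np.outer(k, Pu)) / lam
    return e, y

# Synthetic pulse trains standing in for maternal and fetal components.
fs = 500
t = np.arange(10 * fs) / fs
mecg = np.sin(2 * np.pi * 1.2 * t) ** 21            # reference "mECG"
fecg = 0.3 * np.sin(2 * np.pi * 2.3 * t) ** 21      # weaker, faster "fECG"
h = np.array([0.8, 0.3, -0.1])                      # hypothetical maternal path
maternal_in_d = np.convolve(mecg, h)[: len(mecg)]
d = fecg + maternal_in_d                            # "aECG*" = fetal + maternal
e, y = rls_cancel(d, mecg)
```

In this toy setting the error signal `e` approaches the synthetic fetal component once the filter has adapted, which mirrors how the fECG is obtained as the error signal in the procedure described above.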

Dataset
All records of the publicly available datasets (Labour and Pregnancy) were used for the experiments. The study protocol was approved by the Ethical Committee of the Silesian Medical University, Katowice, Poland (NN-013-345/02). Subjects read the approved consent form and gave written informed consent to participate in the study. We decided not to include other datasets because there are currently no sufficiently high-quality datasets on which it would be possible to objectively evaluate the quality of fECG extraction. Another reason is the need for expert-verified reference annotations so that the data can be properly validated. The selection of datasets was conditioned by the following criteria:
• Dataset including real data: the use of synthetic data for the experiments is not suitable because algorithms often work effectively on synthetic data but fail when applied to real data. Synthetically generated signals are therefore only suitable for initial experiments, not for the validation phase.
• Availability of the fQRS annotations: for an objective evaluation of the effectiveness of the algorithm, it is necessary to evaluate the correctness of determining the positions of fQRS complexes and estimating fHR. Without reference positions verified by experts, it is not possible to carry out objective assessment (using state-of-the-art evaluation metrics), and the effectiveness of the filtration can then only be evaluated subjectively, which is not very accurate.
• Sufficient length and diversity of the recordings: it is also important to use sufficiently long recordings (at least five minutes long) for testing. Testing only a short section of the signal can lead to distorted results. It is also important to include recordings of fetuses of different gestational ages, different ratios of mQRS:fQRS amplitudes and with different types and levels of interference. In addition to physiological records, it would be appropriate to use pathological records as well.
Although the availability of such datasets is generally the biggest obstacle when testing algorithms for fECG extraction, the Labour and Pregnancy datasets come closest to the aforementioned requirements. Other publicly available datasets either contain synthetic signals, do not contain the reference positions of fQRS complexes, or contain very short and low-quality signals. All signals from the Labour and Pregnancy datasets were acquired under clinical conditions as part of research studies at the Department of Obstetrics and Gynaecology of the Medical University of Silesia in Katowice, Poland. The research was approved by the competent University Bioethics Committee and by the participating patients. Records of both datasets are publicly available in the figshare repository. All dataset information was taken from [38].
In both datasets, the aECG signals were sensed from the mother's abdomen by means of four conventional Ag/AgCl electrodes and recorded using the KOMPOREL system, consisting of a signal recorder module and a portable computer. A conductive paste was applied on the upper epidermis layer before attaching the electrodes. The electrodes were attached around the mother's navel line, the common reference was placed above the pubic symphysis, and the electrode with the active-ground signal was placed on the mother's left leg. The direct fECG signal was sensed from the fetus' head with a sterile spiral electrode. The signals were converted to a digital format with 16-bit resolution and a sampling frequency of 500 Hz for the aECG signals and of 1000 Hz for the direct fECG. All sensed aECG signals were pre-processed using a simple filter with multiple notches, located every 50 Hz. Low-frequency interference was eliminated by setting the first frequency threshold to approx. 5 Hz; a stop-band between 45 Hz and 55 Hz was set to remove power-line interference. In the direct fECG, the noise was suppressed in a comparable manner, but with filter parameters modified according to the sampling frequency [38].
• Labour dataset: the first dataset contains twelve 5-minute long records of women between their 38th and 42nd week of pregnancy, sensed in an advanced stage of labor. Each recording contains four aECG signals and includes a direct fECG signal sensed concurrently from the fetus' head using a scalp electrode. The direct fECG gives the most reliable information about fHR and was therefore considered as the reference or 'gold standard'. In addition, the dataset includes annotations with precise positions of fQRS complexes, which the authors determined by automatic detection of R-peaks in the direct fECG signal and whose correctness was validated by clinical experts.
• Pregnancy dataset: the second dataset contains ten 20-minute long records of women between their 32nd and 42nd week of pregnancy. Each recording also contains four aECG signals; in this case, however, no reference direct fECG was sensed, as the measurement using a scalp electrode can be done only after the rupture of the amniotic membrane. Nevertheless, in order to be able to use the signals for testing and evaluating the extraction algorithms, the authors processed and analyzed the aECG signals and provided annotations with reference positions of mQRS and fQRS complexes. The correctness of the fQRS complex positions was again manually checked and corrected by clinical experts. Each fQRS complex was assigned a reliability flag. Flag 1 means that the fQRS complex position was verified by a clinical expert, while flag 0 means that the position of the fQRS complex could not be verified by a clinical expert due to high levels of signal interference (the number of unverified fQRS complexes was 130, i.e. 0.46% of the total number of marked fQRS complexes). The authors recommend excluding these unreliable fQRS complex positions when evaluating the detection accuracy.
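A minimal sketch of how detection accuracy could be evaluated against such annotations, while excluding positions with reliability flag 0, is shown below. The matching tolerance (25 samples, i.e. 50 ms at 500 Hz), the greedy matching strategy, and the function names are assumptions made for illustration; F1 is computed as the harmonic mean of SE and PPV, and the ACC formula TP/(TP+FP+FN) is likewise an assumption, as it is not stated in this section.

```python
def match_peaks(reference, detected, tolerance):
    """Greedy one-to-one matching of sorted peak positions (in samples)."""
    tp = 0
    i = j = 0
    while i < len(reference) and j < len(detected):
        diff = detected[j] - reference[i]
        if abs(diff) <= tolerance:
            tp += 1          # detection matches a reference position
            i += 1
            j += 1
        elif diff < 0:
            j += 1           # detection with no reference nearby (false positive)
        else:
            i += 1           # reference with no detection nearby (false negative)
    fp = len(detected) - tp
    fn = len(reference) - tp
    return tp, fp, fn

def qrs_metrics(reference, flags, detected, tolerance):
    """ACC, SE, PPV and F1, ignoring annotations with reliability flag 0."""
    reliable = sorted(r for r, f in zip(reference, flags) if f == 1)
    unreliable = [r for r, f in zip(reference, flags) if f == 0]
    # Detections close to an unreliable annotation are neither rewarded nor penalized.
    kept = sorted(d for d in detected
                  if all(abs(d - u) > tolerance for u in unreliable))
    tp, fp, fn = match_peaks(reliable, kept, tolerance)
    se = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    acc = tp / (tp + fp + fn) if tp + fp + fn else 0.0   # assumed ACC definition
    f1 = 2 * se * ppv / (se + ppv) if se + ppv else 0.0  # harmonic mean of SE and PPV
    return {"TP": tp, "FP": fp, "FN": fn,
            "ACC": acc, "SE": se, "PPV": ppv, "F1": f1}

# Hypothetical sample positions; the third annotation carries flag 0.
reference = [100, 320, 540, 760, 980]
flags = [1, 1, 0, 1, 1]
detected = [102, 318, 545, 700, 985]
result = qrs_metrics(reference, flags, detected, tolerance=25)
```

In this toy example the detection at sample 545 is dropped together with the unverified annotation at 540, the detection at 700 counts as a false positive, and the annotation at 760 counts as a false negative.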
The datasets used contain records acquired both during pregnancy and during childbirth. Therefore, they contain diverse signals varying in their quality, in the magnitudes of the maternal and fetal components, and in the interferences present. In the Labour dataset, the signals were recorded with a more prominent fetal component, while in the Pregnancy dataset, the fetal component was significantly lower compared to the maternal one. For example, the average mQRS:fQRS ratio was 2 for the Labour dataset, whereas it was 3.5 for the Pregnancy dataset. Different levels and types of noise were present in both datasets. These were mainly power line interference, but also low-frequency interferences caused by movements (both maternal and fetal) or uterine contractions, as well as by impedance changes between the electrodes and maternal skin. Since the quality of individual records varied and cannot be generalized for the entire dataset, we present four parameters characterizing the quality of each record:
• Parameter WM: determines the mECG signal level in relation to interferences.
• Parameter WF: determines the fECG signal level in relation to interferences.
The WM and WF parameters express the amplitudes of the mECG and fECG signals in relation to the interferences in dB units. As for the WEM and WEF parameters, if there are no changes in the amplitude or shape of the individual mQRS/fQRS complexes, their values are equal to zero. The values of the WEM and WEF indices are also affected by the interference level. If the interference amplitude is comparable to the mQRS/fQRS amplitude, the WEM and WEF values take it into account. Detailed information about these parameters, including mathematical formulas for their calculation, can be found in the original publication by Matonia et al. [38]. The corresponding values of all selected parameters are shown for the Labour and Pregnancy datasets in Table 1. All available records (12 records from the Labour dataset and 10 records from the Pregnancy dataset) were used for our experiments and none of them were excluded.

Evaluation parameters
It is highly desirable to maintain the same process of objective evaluation of extracted signals so that individual studies can be compared without having to repeat experiments with already tested methods. Therefore, objective evaluation in this study is based on the computation of R-peak detection accuracy (ACC). This parameter is very often reported in publications addressing fECG signal extraction and determination of R-peak positions, for example [18,20]. Our team of authors has relied primarily on this parameter in a large number of already published papers [15,16].
In order to compute the ACC parameter, the fECG signal must first be extracted and the positions of R-peaks in it must be estimated. Additionally, the tested dataset must contain reference annotations of the correct positions of R-peaks determined by experts. The next step is to determine the true positive (TP), false positive (FP) and false negative (FN) parameters. Detected R-peaks in the extracted signal that fall within the interval of ±50 ms from the reference annotations are marked as TP. Detected R-peaks in the extracted signal that fall outside this interval are marked as FP. Finally, any omitted R-peaks, i.e. those that should have been detected in the said interval but were missing, are marked as FN. After the TP, FP, and FN parameters are determined, Eq (5) can be used to determine ACC, Eq (6) to determine SE, Eq (7) to determine PPV, and finally Eq (8) to determine F1 [39].
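The counting procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy one-to-one matching strategy and the function names `count_detections` and `detection_metrics` are assumptions made for illustration, and the formulas are the commonly used definitions ACC = TP/(TP+FP+FN), SE = TP/(TP+FN), PPV = TP/(TP+FP), and F1 = 2TP/(2TP+FP+FN), which are assumed here to correspond to Eqs (5)-(8).

```python
def count_detections(detected_ms, reference_ms, tol_ms=50):
    """Count TP, FP, FN for R-peak positions given in milliseconds.

    Assumption: each reference peak can be matched by at most one
    detection (greedy matching over the two sorted lists).
    """
    detected = sorted(detected_ms)
    reference = sorted(reference_ms)
    tp = 0
    i = j = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tol_ms:
            tp += 1          # detection inside the +/- tol window
            i += 1
            j += 1
        elif diff < 0:
            i += 1           # detection with no reference nearby (FP)
        else:
            j += 1           # reference peak with no detection (FN)
    fp = len(detected) - tp
    fn = len(reference) - tp
    return tp, fp, fn


def detection_metrics(tp, fp, fn):
    """Return (ACC, SE, PPV, F1) as fractions in the range 0..1."""
    acc = tp / (tp + fp + fn)
    se = tp / (tp + fn)
    ppv = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return acc, se, ppv, f1


# Hypothetical example: one detection (1700 ms) misses its reference (1500 ms).
reference = [500, 1000, 1500, 2000]
detected = [510, 1040, 1700, 2000]
tp, fp, fn = count_detections(detected, reference)
print(tp, fp, fn)                      # 3 1 1
print(detection_metrics(tp, fp, fn))   # (0.6, 0.75, 0.75, 0.75)
```

Note that ACC defined this way penalizes both FP and FN, so it is always less than or equal to SE and PPV.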

Experimental setup
Since ICA-based methods are multi-channel, at least two aECG signals were needed on the input side. Our previous research [15,16] showed that some aECG signals are not sensed with sufficient quality, which may lead to a needless deterioration of algorithm performance. For that reason, only 2 or 3 aECG input signals were used for further processing in some recordings (a detailed description of the procedure for selecting optimal combinations of aECG signals for each recording is available, for example, in [15,16]). For clarity, the whole signal processing procedure is summarized in the following steps:
1. Pre-processing of aECG signals using a FIR filter to eliminate isoline fluctuations. The cutoff frequencies were set to 5 Hz and 50 Hz, and the filter order was 500. The lower cutoff frequency of 5 Hz was chosen because the authors of the dataset had already removed frequencies below 5 Hz. The upper cutoff frequency of 50 Hz was chosen because the focus of this work is on fetal QRS complexes, which lie predominantly at frequencies of 10-15 Hz [1].
2. Estimate of individual components (noise, mECG and aECG* signals with enhanced fetal component) from a mix of aECG signals using ICA-based methods. The number of output components was set to match the number of aECG input signals for the specific recording.
3. Automatic choice of aECG* and mECG together with amplitude/time centering. After receiving the output components from the ICA method, QRS complexes are detected in all output components. Based on the amplitudes of the QRS complexes, the polarity of the signals is adjusted. Subsequently, the one with the smallest number of QRS complexes and meeting the condition of 60-100 bpm is selected as the mECG signal. The remaining two output components are compared with each other using the SNR, where the signal with the higher SNR is designated as aECG*. Time centering is then performed, i.e. synchronization of the mECG and aECG* based on the positions of the detected QRS complexes. Finally, based on the amplitudes of the detected QRS complexes, the signals are equalized in amplitude. Both centerings are applied to increase the efficiency of the FTF algorithm.
4. Application of FTF algorithm used to modify the mECG signal in order for it to correspond to the mECG component in the aECG* signal as much as possible. The modified mECG signal was subtracted from aECG*, leading to fECG extraction.
5. Detection of R-peaks in the resulting fECG signal using a CWT detector [40]. The last step of the detector includes an algorithm that modifies the detected R-peaks based on 3 rules in accordance with the patent published in [41]:
(a) A missing R-peak is added if the current RR interval > 1.3 times the median of all RR intervals. In such a case, the missing R-peak is added exactly in the middle between the adjacent R-peaks.
(b) An incorrectly detected R-peak is removed if the current RR interval < 0.7 times the median of all RR intervals. In such a case, the excess R-peak is removed.
(c) An R-peak is considered incorrectly positioned if the RR_i interval < 0.9 times the median of RR intervals and concurrently the RR_{i+1} interval > 1.1 times the median of RR intervals, or vice versa, i.e. the RR_i interval > 1.1 times the median of RR intervals and concurrently the RR_{i+1} interval < 0.9 times the median of RR intervals. In such a case, the incorrectly positioned R-peak is shifted to the middle between the adjacent R-peaks.
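The three correction rules (a)-(c) can be sketched in Python as follows. This is an illustrative sketch only, not the patented procedure from [41]: the function name `correct_r_peaks`, the order in which the rules are applied (b, then c, then a), the choice of which of two close peaks is dropped, and the use of a single median computed from the uncorrected RR intervals are all assumptions.

```python
from statistics import median


def correct_r_peaks(peaks_ms):
    """Apply RR-interval-based corrections to sorted R-peak times (ms)."""
    if len(peaks_ms) < 3:
        return list(peaks_ms)
    rr = [b - a for a, b in zip(peaks_ms, peaks_ms[1:])]
    med = median(rr)

    # Rule (b): drop a peak that follows the previous kept peak too closely.
    kept = [peaks_ms[0]]
    for p in peaks_ms[1:]:
        if p - kept[-1] < 0.7 * med:
            continue
        kept.append(p)

    # Rule (c): recentre a peak that is too close to one neighbour
    # and too far from the other.
    for i in range(1, len(kept) - 1):
        before = kept[i] - kept[i - 1]
        after = kept[i + 1] - kept[i]
        short_long = before < 0.9 * med and after > 1.1 * med
        long_short = before > 1.1 * med and after < 0.9 * med
        if short_long or long_short:
            kept[i] = (kept[i - 1] + kept[i + 1]) / 2

    # Rule (a): insert a peak in the middle of an overly long interval.
    corrected = [kept[0]]
    for p in kept[1:]:
        if p - corrected[-1] > 1.3 * med:
            corrected.append((corrected[-1] + p) / 2)
        corrected.append(p)
    return corrected


# Hypothetical example: the beat expected near 1200 ms was missed.
print(correct_r_peaks([0, 400, 800, 1600, 2000, 2400]))
# [0, 400, 800, 1200.0, 1600, 2000, 2400]
```

Such corrections smooth the resulting fHR trace but can also mask genuine rhythm irregularities, so the thresholds should be treated as tunable.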
An example of detected R-peaks in recording r6 of the Labour dataset, compared to the direct fECG with reference R-peak positions highlighted by annotations, is shown in Fig 2. The direct fECG signal could not be acquired in the Pregnancy dataset. In the example, the grey dashed lines represent the ±50 ms interval around the reference annotations within which a detected R-peak is marked as a TP value.

Results
This section summarizes the results of fQRS complex detection achieved by each ICA-based method and subsequent application of an adaptive FTF filter. The ICA-based methods were also compared in terms of time complexity. In this section, a statistical analysis is presented to determine whether the differences between the tested algorithms are statistically significant, both in terms of the quality of filtration they are providing, and in terms of their time consumption. All data used and analysed are available as supporting material to this study.

Accuracy of fQRS complex detection
The resulting ACC values acquired with each ICA-based method and the follow-up application of FTF are summarized for the Labour dataset in Table 2. According to our subjective assessment, the performance of individual algorithms was very similar for some records but differed for others. For the Labour dataset, six records (r1, r5, r6, r8, r11 and r12) achieved very high accuracies using all tested methods, and the ACC value did not differ by more than 6%, which can be considered a negligible difference. These were high-quality aECG signals, and extraction was not difficult for any algorithm. For the r2 recording, the results of all algorithms were similar, except for the RobustICA and ERICA algorithms, which had the largest percentage difference (ERICA achieved 11.86% lower accuracy than RobustICA), which was due to ERICA's inability to accurately detect fQRS complexes. On the other hand, in the case of record r3, the RobustICA method achieved the worst result (28.94%). Since this was a signal of lower quality, the method probably failed to recognize the fetal component and considered it as interference. The KICA method coped best with this problematic signal and can therefore be recommended for processing both high-quality and low-quality input signals. RobustICA, together with ERICA, also failed for recording r4, in which they again failed to correctly recognize the fetal component and labeled it as interference, while considering the mQRS residue or other interference as a fetal component and labeling it as fQRS. A difference of 29.06% between the most efficient method (RADICAL) and the worst method (SIMBEC) was observed for the r7 record, for which the SIMBEC method was unable to suppress residues of mQRS complexes. The same problem also occurred with the r9 record, where the JADE method failed compared to the RADICAL method, resulting in an accuracy difference of 11.98%. For the r10 record, the difference in accuracy between the best (RADICAL) and the worst (SIMBEC) method was even higher, at 22.25%.
The average values exceeded 80% in all cases (i.e. mean ACC values of all methods ranged between 84.95% and 91.41%), and thus all tested algorithms achieved effective extraction in this dataset. When assessing the effectiveness of the algorithms based on the mean of the ACC values over all recordings, the most effective algorithm is RADICAL (91.41%), followed by the FlexICA, SOBI and KICA methods, which reached a mean accuracy of over 90% (91.23%, 90.88%, and 90.02%, respectively). As the least effective methods, we can consider the methods with a mean accuracy below 90%, i.e. ERICA, RobustICA, and SIMBEC, reaching 84.95%, 85%, and 86.93%, respectively.
For detailed statistical results (including TP, FP, FN values and ACC, SE, PPV, F1 indices) achieved by the most effective RADICAL method for each recording, see Table 3. The table includes the combination of aECG input signals used. It shows that the method was effective from the perspective of ACC (ACC > 80%) in most recordings, except r3 and r4. The SE, PPV, and F1 values are also given in Table 3.
The resulting ACC values for the Pregnancy dataset are summarized in Table 4. For this dataset, the algorithms achieved high-quality results only for four records (r1, r4, r5, and r10), where their ACC values did not differ by more than 8.13%. For the r2 record, the results of the individual algorithms were comparable, but a large difference in accuracy (26.48%) was observed between the RADICAL and RobustICA methods. The reason was that RobustICA was not able to correctly recognize the fetal component and labeled the interference or the maternal component as fQRS complexes. The same case also occurred with records r3, r7, and r9, where one method deviated. For the r3 record, it was the SIMBEC method, with a difference of 21.62% compared to the most effective KICA. For the r7 record, the least effective AMUSE algorithm differed from the most effective one (ERICA) by 21.23%. As for the r9 record, ERICA achieved an accuracy up to 36.42% lower than KICA, the most effective algorithm for this record. On the other hand, for records r6 and r8, almost all methods achieved very inaccurate extraction (for r6 the ACC values were in the range 14.47%-59.08% and for r8 in the range 24.17%-66.56%), which was due to the low quality of the input aECG signals. In the case of the r6 recording, the RobustICA method was best able to distinguish noise from the useful fECG signal, while in the case of the r8 recording, it was the FastICA method.

The extraction accuracy was generally lower in this dataset, as the mean ACC values ranged between 77.34% and 84.14%, and 6 variants of the ICA method (AMUSE, FlexICA, Infomax, RADICAL, SIMBEC, SOBI) did not exceed the mean ACC of 80%. The JADE and SOBI methods were not the most accurate in fQRS complex detection in any of the recordings and had mean values of ACC = 80.59% and ACC = 77.35%, respectively. The AMUSE, ERICA, FlexICA, Infomax, and SIMBEC algorithms each achieved the most accurate detection of fQRS complexes in 1 recording, with mean values of ACC = 77.34%, ACC = 80.65%, ACC = 79.18%, ACC = 79%, and ACC = 78.84%, respectively. KICA, RADICAL, and RobustICA each achieved the most accurate detection of fQRS complexes in 2 recordings, with mean values of ACC = 82.15%, ACC = 78.70% and ACC = 80.54%, respectively. FastICA turned out to be the most suitable for the Pregnancy dataset; even though it achieved the most accurate detection of fQRS complexes in 1 recording only, it had the highest mean value of ACC (ACC = 84.14%). For this dataset as well, detailed statistical results for all recordings achieved with the most effective method, FastICA, are provided; see Table 5. From the perspective of the ACC parameter, the algorithm was effective (ACC > 80%) in most of the recordings, with the exception of r6, r7, and r8. The SE value exceeded 80% in all recordings except r6, and the PPV and F1 values were higher than 80% in most recordings except r6 and r8. In this case, efforts to correctly detect all fQRS complexes failed in all recordings. In other words, 100.00% accuracy was not achieved in any of the recordings.

The main reason for detecting the fQRS complexes is their use in determining fHR, which is then used by clinicians to assess the fetal health state. Therefore, an example of fHR traces created from signals extracted with FastICA-FTF, compared to the reference fHR traces, is shown in Fig 3 for all recordings of the Labour dataset (Fig 3a) and the Pregnancy dataset (Fig 3b). A moving average with a window length of 15 was applied to the reference and extracted fHR signals. This is because the fHR of the extracted signal is usually not so accurate and tends to produce positive or negative peaks. In the case of the Labour dataset, the fHR traces estimated with the FastICA-FTF method were comparable to the reference fHR traces in all recordings except r3. In the case of the Pregnancy dataset, the trend of the reference fHR traces was followed in all recordings except r6, r7, r8, and r10, where the estimated fHR traces slightly differed from the references. The less accurate determination of the fHR traces in records of both datasets was caused by low-quality aECG signals. It would also be beneficial if the measuring system used abdominal electrodes only, i.e. without the need to sense reference chest signals, which would be more comfortable and less stressful for the mother.
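As an illustration of how an fHR trace can be derived from detected fQRS positions, the following Python sketch converts R-peak times to beat-to-beat heart rate and smooths it with a moving average. The function name `fhr_trace` is hypothetical, and two details are assumptions because the text does not specify them: the window length of 15 is interpreted as 15 consecutive beats, and the window is centred and shortened at the edges of the trace.

```python
def fhr_trace(r_peaks_ms, window=15):
    """Beat-to-beat fHR (bpm) from R-peak times in ms, smoothed by a
    centred moving average that is shortened at the trace edges."""
    rr = [b - a for a, b in zip(r_peaks_ms, r_peaks_ms[1:])]
    fhr = [60000 / interval for interval in rr]   # ms per beat -> bpm
    half = window // 2
    smoothed = []
    for i in range(len(fhr)):
        lo = max(0, i - half)
        hi = min(len(fhr), i + half + 1)
        smoothed.append(sum(fhr[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical example: a perfectly regular rhythm with RR = 400 ms.
peaks = list(range(0, 8400, 400))
trace = fhr_trace(peaks)
print(len(trace), trace[0])   # 20 150.0
```

A single false or missed detection produces a sharp spike in the unsmoothed trace, which is the effect the moving average is meant to attenuate.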

Evaluation of algorithms time complexity
In addition to evaluating the algorithms from the perspective of extraction quality, we compared each ICA-based method from the perspective of computation time; see Table 6. To keep this experiment objective, all algorithms were run on the same PC with the following configuration: Core i9 9980XE (18c/36t), MEG X299 CREATION, 64GB DDR4 3200MHz cl14, GTX 1050 TI. Only one test of an ICA-based method was running at a time, and no other operations that would reduce the performance were running in the background (only basic applications without any notable effect on the performance). Further, for each ICA-based method, we performed the tests on each record for all possible combinations of 2 inputs (6 combinations), then 3 inputs (4 combinations), and finally 4 inputs (1 combination). We then computed the median of all measured times for 2 inputs, 3 inputs and 4 inputs. For each input count option, the experiment was run more than 40 times on average (depending on the number of records in the dataset and the combination). The effect of the number of input signals on each ICA-based method was also observed, and this experiment was carried out separately on both tested datasets so that the effect of the different number of input signal samples can be seen.
According to the recorded computation times shown in Table 6, the fastest method was KICA with a mean computation time of 0.02 s. The slowest were the Infomax and RADICAL methods, with mean computation times exceeding 20 s. The FastICA method, which achieved the most effective extraction, was the sixth fastest with a mean time of 0.452 s. However, the difference between the computation time of the FastICA method and the fastest AMUSE and KICA methods was merely several tenths of a second, which can be deemed insignificant.
When comparing the computation times of the FastICA and the RADICAL method, which also achieved very good extraction results, the FastICA can be considered superior with its almost fifty times shorter computation time compared to the RADICAL. In addition, the computation time of the RADICAL method was substantially rising with the increasing number of aECG input signals. This might obstruct efforts to implement this method into a device operating in real-time, which often utilizes higher number (up to tens) of sensing electrodes.
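The numbers of input combinations used in the timing experiment (6, 4, and 1 for 2, 3, and 4 inputs) follow directly from choosing subsets of the four aECG channels. The Python sketch below enumerates them and shows one way the median runtime per input count could be measured. The channel labels, the function names, and the `run_ica` callable are hypothetical placeholders, not part of the original experiment.

```python
from itertools import combinations
from statistics import median
import time


def input_subsets(channels, min_size=2):
    """All channel combinations for each size from min_size up to all channels."""
    return {k: list(combinations(channels, k))
            for k in range(min_size, len(channels) + 1)}


def median_runtime(run_ica, records, subsets):
    """Median wall-clock time of run_ica(record, subset) over all
    records and channel subsets (run_ica is a user-supplied callable)."""
    times = []
    for record in records:
        for subset in subsets:
            start = time.perf_counter()
            run_ica(record, subset)
            times.append(time.perf_counter() - start)
    return median(times)


subsets = input_subsets(("aECG1", "aECG2", "aECG3", "aECG4"))
print({k: len(v) for k, v in subsets.items()})   # {2: 6, 3: 4, 4: 1}
```

Using the median rather than the mean makes the summary less sensitive to occasional slow runs caused by background activity.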

Statistical analysis
In order to find out whether the differences between the tested algorithms are statistically significant, we performed a) a statistical analysis of the results achieved for all parameters evaluating the quality of filtration (ACC, SE, PPV and F1) and b) a statistical analysis of the time consumption of individual algorithms. Statistical analysis was performed using R (R Core Team). In all cases, the threshold for statistical significance was set to p < 0.05.
First, the normality of the data was tested using the Shapiro-Wilk test for all parameters evaluating the quality of filtration, as well as for the time requirements for each algorithm. In some cases, statistically significant deviations from normality were detected, and therefore non-parametric methods were chosen for data description and subsequent analysis. The median and interquartile range (IQR) were used to describe the analyzed variables. A detailed analysis of the filtration quality results was performed for individual evaluation parameters and individual datasets. However, the results did not show significant differences between the given results and for this reason only the summary analysis for both datasets and one selected parameter (ACC) is presented. A deeper analysis, separately for the Labour dataset and separately for the Pregnancy dataset, is then devoted to the analysis of the time requirements of the tested algorithms, where statistically significant differences were found.
To statistically evaluate the differences between the tested algorithms in terms of their filtration outputs and computational demands, we used the Friedman test, which is the nonparametric alternative to the one-way ANOVA with repeated measures, supplemented with the Kendall concordance coefficient. The Kendall concordance coefficient expresses the simultaneous association (relatedness) between k sets of rankings (i.e., cases; correlated samples). The range of Kendall concordance is from 0 to +1. Values close to zero represent a lack of agreement in the rankings of the algorithms among records, while values close to 1 represent perfect agreement in the rankings of the algorithms among records. When the Friedman test detected a statistically significant difference between individual algorithms, a Conover post-hoc analysis with the Benjamini & Yekutieli correction was performed to calculate adjusted p-values for the detection of homogeneous subgroups of algorithms.
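The relationship between the Friedman statistic and the Kendall concordance coefficient can be made concrete with a small Python sketch. It is an illustration only (the analysis in this study was performed in R): the function names are hypothetical, the formulas are the textbook versions without tie correction, and neither the p-value nor the Conover post-hoc test is computed. With n records (blocks), k algorithms, and rank sums R_j, the sketch uses W = 12 S / (n^2 (k^3 - k)) with S the sum of squared deviations of R_j from their mean n(k+1)/2, and the Friedman statistic chi^2 = n (k - 1) W.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman_kendall(table):
    """table[r][a] is the score of algorithm a on record r.

    Returns (chi2, W) computed without tie correction.
    """
    n = len(table)        # number of records (blocks)
    k = len(table[0])     # number of algorithms (treatments)
    rank_sums = [0.0] * k
    for row in table:
        for a, rank in enumerate(average_ranks(row)):
            rank_sums[a] += rank
    expected = n * (k + 1) / 2
    s = sum((rs - expected) ** 2 for rs in rank_sums)
    w = 12 * s / (n ** 2 * (k ** 3 - k))
    chi2 = n * (k - 1) * w
    return chi2, w


# Hypothetical scores: three records rank three algorithms identically.
print(friedman_kendall([[1, 2, 3], [10, 20, 30], [5, 6, 7]]))   # (6.0, 1.0)
```

A value of W near 0 therefore means that no algorithm is consistently ranked above the others across records, which matches the interpretation given above.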

Detection accuracy results
When comparing the parameters evaluating the quality of filtering (ACC, SE, PPV, F1) across both datasets, no statistically significant differences between the tested algorithms were found (in all cases p-value > 0.05), see Table 7. Therefore, we decided to further analyze the comparison results for only one of these parameters, namely ACC. It can be noted that for ACC there are outliers for all algorithms, and they all belong to the same records: for the Labour dataset they are records r3 and r4, and for the Pregnancy dataset they are records r6 and r8. These deviations are probably related to the reduced quality of the input signals.

Time consumption analysis results
In the next step, we analyzed the time consumption of the tested algorithms. For each algorithm, the time needed to create ICA components was measured with different numbers of inputs (i.e. 3 tested scenarios: 2 inputs, 3 inputs, 4 inputs). This methodology was selected because the number of inputs significantly affects the quality of the extraction and the computational complexity. The results showed a statistically significant difference between the calculation times of the individual algorithms for all tested scenarios (in all cases the p-value for both the Labour dataset and the Pregnancy dataset was < 0.001), see Tables 8 and 9, respectively. Fig 5 shows a comparison of the required times of individual algorithms using hybrid boxplots. The comparison was made separately for individual numbers of inputs and also separately for the Labour dataset and the Pregnancy dataset. In all cases, it can be seen that the Infomax and RADICAL algorithms show noticeably longer times, which was also confirmed by the results of the Conover post-hoc analysis, which always included these algorithms in the group of the slowest algorithms. In contrast, the AMUSE, KICA, and JADE algorithms appear in most groups associated with the shortest times. Furthermore, the input data have a noticeable influence on the calculation times, which differ between the two datasets.
In Fig 6a and 6c, it can be seen that for good results, the line indicating the average value μ was practically at 0 and the value of 1.96σ was low, which indicates a minimal difference between the reference and filtered fHR curves. For bad results in Fig 6b and 6d, on the other hand, one can see a large shift of the line indicating the average value μ away from 0 and a high value of 1.96σ, indicating a large difference between the reference and filtered fHR traces.
For a better overview, the μ and 1.96σ values obtained for FastICA-FTF method on all data (all records from both datasets) can be seen in Table 10.
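The μ and 1.96σ values discussed above can be computed from paired fHR samples as in the following Python sketch, which follows the usual construction of limits of agreement. The function name `agreement_stats`, the sign convention (estimated minus reference), and the use of the sample standard deviation are assumptions; the source does not state these details.

```python
from statistics import mean, stdev


def agreement_stats(reference_fhr, estimated_fhr):
    """Return (mu, 1.96*sigma) of the paired differences in bpm.

    mu is the mean difference (bias); mu +/- 1.96*sigma gives the
    limits within which about 95% of differences are expected to lie
    if the differences are approximately normally distributed.
    """
    diffs = [e - r for r, e in zip(reference_fhr, estimated_fhr)]
    mu = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return mu, spread


# Hypothetical fHR samples (bpm): the estimate is biased by +1 bpm.
reference = [140, 150, 160, 150]
estimated = [141, 151, 161, 151]
print(agreement_stats(reference, estimated))   # bias of 1 bpm, zero spread
```

A μ close to 0 with a small 1.96σ corresponds to the "good" cases described above, while a shifted μ or a wide spread corresponds to the "bad" ones.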

Analysis of the results
The results presented in the previous section showed that ICA-based methods combined with an adaptive FTF algorithm are able to effectively extract non-invasively sensed fECG. However, there are visible differences in accuracy when detecting fQRS complexes, namely 1)  between individual ICA-based methods, 2) between individual recordings, and 3) between mean results of both datasets.
1. Influence of algorithm selection: the results of the statistical analysis showed no statistically significant difference between the individual algorithms in terms of the accuracy they provide, only in the time needed for their calculations. However, there are visible differences between individual algorithms in some records. To show why each algorithm achieved different results with the same recording, we decided to compare the waveforms of the […] (ACC = 95.07%). AMUSE and KICA were also able to eliminate the mQRS complexes well. However, along with the maternal component, they slightly suppressed the fQRS complexes as well, which led to less accurate extraction (ACC = 87% and ACC = 82.16%, respectively). The SIMBEC method was the least successful of all methods in suppressing the mQRS complexes. In addition, the fetal component reached low levels compared to the maternal one, which led to low accuracy of fQRS complex detection (ACC = 69.01%). Nevertheless, in the case of recording r4 in the Pregnancy dataset (see Fig 5b), mQRS complexes were sufficiently suppressed by all methods, which led to effective extraction (ACC > 93%) by all methods. Visual comparison shows that the mQRS complex residues were best suppressed by the FastICA method, although this had no substantial effect on the resulting extraction accuracy. The least accurate methods with this recording were RobustICA (ACC = 93.85%) and KICA (ACC = 94.16%), which failed to eliminate some of the residues of maternal components, and the amplitudes of some of the fQRS complexes were lower compared to the mQRS complex amplitude. It should be noted that the low average accuracy of the RADICAL method on the Pregnancy dataset is caused mainly by one extremely low outcome on record r6, where the accuracy dropped below 20% (ACC = 14.47%), and a further low accuracy outcome on record r8 (ACC = 44.56%). On the other hand, for the other records, this method worked similarly to the other algorithms.
We can therefore conclude that the RADICAL method was the least robust algorithm in the experiments on low-quality data. Finally, as shown by the results of the statistical analysis, the choice of the algorithm does not significantly influence the results. What really counts, however, is the setting of the algorithms used. For ICA methods, the key parameters to be set are the number of components, the convergence criterion, the number of iterations and so on (depending on the type of ICA-based method). For adaptive algorithms, these are mainly the filter order, convergence constant or forgetting factor [15,16]. In our previous studies [42,43], we found it advantageous to use optimization algorithms to find the optimal parameter setting instead of selecting it manually. This allows the different parts of the hybrid system to be appropriately adjusted according to the week of gestation, the position of the fetus and other circumstances.
2. Influence of input signal quality: the differences in extraction accuracy between recordings are caused primarily by the quality of the aECG input signals. Earlier studies [44] already noted the relationship between proper positioning of the sensing electrodes, the quality of the acquired aECG signals, and the final quality of fECG extraction. Recordings in which the achieved accuracy of fQRS complex detection was low contained aECG input signals of substandard quality. This means that the fetal component level compared to the maternal one was very low, in some cases even invisible, and some of the signals contained noise. Effective extraction is almost impossible with such signals, and greater attention should be paid to the proper positioning of the sensing electrodes and the measurement system setting when acquiring these signals. An example of the effect of aECG signal quality on the quality of the extracted fECG signal is shown in Fig 8. Fig 8a and 8c present high-quality aECG signals from recording r5 of the Labour dataset and recording r3 of the Pregnancy dataset, respectively. In both cases, the fetal component level is high enough with respect to the maternal one and the signals are not burdened with other interference. The extraction using these aECG signals was very accurate (ACC = 97.01% and ACC = 93.82%, respectively), as the mECG was sufficiently suppressed and the fECG was effectively enhanced. In contrast, Fig 8b and 8d present low-quality aECG signals from recording r3 of the Labour dataset and recording r7 of the Pregnancy dataset, respectively. The fetal component of the aECG signals is barely visible in these recordings, and some of the signals are burdened with interference. This has led to insufficiently accurate fECG extraction (ACC = 47.91% and ACC = 79.16%, respectively).
3. Influence of dataset used: comparison of the results of both datasets shows that the mean accuracy achieved during fQRS complex detection in recordings from the Labour dataset was higher than in the case of the Pregnancy dataset recordings (ACC = 84.14%). Based on the information from [38], in which the authors described and analyzed the datasets, along with our own findings, such a difference could be caused by the following:
• The first obvious reason for such differences is that in the Labour dataset, the gestational age of the fetuses monitored was higher (38-42 weeks) in comparison to those in the Pregnancy dataset (32-42 weeks), which generally means that the amplitude of the fetal ECG is higher and thus easier to extract [1,45-48]. This is confirmed in the Labour dataset, where according to the information from [38], the mean mQRS:fQRS complex amplitude ratio was 2, while the ratio in the Pregnancy dataset was about 3.5. It can be concluded that with the smaller amplitude difference between the maternal and fetal components in the Labour dataset, the algorithms were better able to suppress the maternal component and extract a higher-quality fECG.
• In [38], the authors determined parameters describing changes to fQRS complex amplitudes. In the case of Labour dataset, amplitude levels were more stable compared to fQRS complex amplitudes in the Pregnancy dataset. Lower variance of fQRS complex amplitudes therefore allows acquiring more uniform detection function, which then leads to more accurate detection in the Labour dataset recordings.

Summary and discussion
Since not all ICA-based methods worked well for various signals, we decided to create a summarizing evaluation of the algorithms, see Table 11. The goal was to find which algorithm generally detects the most TP values and the fewest FP and FN values, while achieving the highest mean values of ACC, SE, PPV, and F1 in both datasets. The highest number of correctly detected fQRS complexes, i.e. TP values, was acquired with the FastICA method. In addition, the FastICA method produced the lowest number of FP and FN values, which led to the most accurate detection of fQRS complexes according to all the applied parameters: ACC, SE, PPV and F1. Conversely, the lowest number of TP values and the highest number of FP and FN values were produced by the SOBI method. This method can therefore be considered the least suitable for extracting non-invasive fECG. Table 12 provides a comparison of all tested ICA-based methods. For all tested algorithms, we provide their strengths and limitations in the first two columns of this table. The other columns indicate the ranking of the methods according to the average results achieved in this experiment for all the tested parameters on both datasets. The last column corresponds to the sum of the placements. The results of this comparison showed that the FastICA algorithm seems to be the best compromise among the ICA-based methods when considering the extraction accuracy and computation speed. Also, the algorithm's strengths are numerous in comparison with the other methods. The disadvantage of FastICA is that it changes the order of the extracted components (aECG*, mECG, and noise) and the signal amplitude. Therefore, its use in automated devices would require the development of a precise algorithm for automatic identification of the components.
Another problem may be the randomness effect, which causes different output components each time the ICA-based method is run. This means that an ICA-based method would have to be run several times to eliminate this randomness effect. The data used in this study were of good/medium quality, so this randomness effect was minimal, which is why we neglected it after the initial experiments. In these initial experiments, we ran individual algorithms repeatedly on the same recording and the same combinations of electrodes, and the randomness effect did not manifest itself significantly. In the case of low-quality signals, of course, the efficiency of the algorithms would drop sharply and the randomness effect could have a big influence. With low-quality signals, where the fECG is almost invisible, it may even happen that the ICA-based method will not be able to extract the necessary output components at all, even with repeated execution. The problem of the randomness effect was addressed, for example, in study [49]. A future study would have to be carried out on a different dataset that includes more low-quality data, where the randomness effect would manifest itself. This could mainly be the relatively new NInFEA dataset [50], for which reference annotations are not yet available. Reference annotations were crucial for our study and therefore we did not include this dataset herein. In addition, the PhysioNet Challenge 2013 dataset [51] could theoretically be used as it does contain reference annotations, but here we encountered the problem that part of the dataset is synthetic and it is not clearly noted which part. In order to make the fQRS complex detection even more accurate, post-processing algorithms (e.g. WT or EMD) could be tested in the future, as they might improve the extraction quality.
Future research should also focus on testing the proposed algorithm from the perspective of morphological analysis feasibility (such as ST segment analysis or QT interval analysis), which is very important for clinical practice. The ST segment analysis is a particularly important indicator of the fetus' health condition and can be used to detect fetal hypoxia very accurately (more accurately than with conventional CTG). The robustness of the algorithm should be determined by testing pathological and abnormal recordings in addition to physiological recordings.

Conclusion
This study provided a comparison of available ICA algorithms for the fECG signal processing. The results on two tested datasets, Pregnancy dataset and Labour dataset, showed superior results of the FastICA algorithm and the follow-up FTF application in terms of accuracy (ACC = 83.72%, SE = 92.13%, PPV = 90.16%, F1 = 91.14%). Moreover, the computation time needed was comparable with the other tested methods (mean of 0.452 s), allowing the algorithm implementation for the real-time applications. These results provide a significant first step towards creating a suitable hybrid method for NI-fECG extraction.